# Objectives
- Filter the space of achieved goals used for HER (hindsight experience replay) relabeling down to interaction states
- Change the rate at which data is sampled to favor interaction-achieving states
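A minimal NumPy sketch of both objectives follows. It rests on our own assumptions and is not the repo's State.buffer code. Each episode comes with a per-timestep boolean interactions mask, for example as flagged by the interaction model. Sampling probabilities up-weight interaction timesteps by a hypothetical interaction_weight. HER's "future" relabeling only draws replacement goals from interaction timesteps at or after the sampled step. her_prob and interaction_weight are illustrative names, not config keys.

```python
import numpy as np


def interaction_sampling_probs(interactions, interaction_weight=4.0):
    """Per-timestep sampling probabilities that up-weight interaction states.

    interactions: (T,) boolean array, True where an interaction was detected.
    interaction_weight: hypothetical knob; 1.0 recovers uniform sampling.
    """
    weights = np.where(interactions, interaction_weight, 1.0)
    return weights / weights.sum()


def relabel_with_interactions(desired_goals, achieved_goals, interactions, idx, rng, her_prob=0.8):
    """HER 'future' relabeling restricted to achieved goals from interaction timesteps.

    desired_goals, achieved_goals: (T, goal_dim) arrays for one episode.
    interactions: (T,) boolean mask of interaction timesteps.
    idx: (B,) sampled timestep indices.
    Returns (B, goal_dim) goals: relabeled where possible, original otherwise.
    """
    goals = desired_goals[idx].copy()
    interaction_steps = np.flatnonzero(interactions)  # sorted interaction timesteps
    for i, t in enumerate(idx):
        if rng.random() >= her_prob:
            continue  # keep the original desired goal
        future = interaction_steps[interaction_steps >= t]
        if future.size == 0:
            continue  # no interaction at or after t: nothing valid to relabel with
        goals[i] = achieved_goals[rng.choice(future)]
    return goals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 10
    achieved = np.arange(T, dtype=float)[:, None]  # toy 1-D achieved goals
    desired = np.full((T, 1), -1.0)  # toy original desired goals
    interactions = np.zeros(T, dtype=bool)
    interactions[[3, 7]] = True  # pretend interactions happened at t=3 and t=7
    idx = rng.choice(T, size=5, p=interaction_sampling_probs(interactions))
    print(idx, relabel_with_interactions(desired, achieved, interactions, idx, rng).ravel())
```

Timesteps with no later interaction keep their original goal, so the filter never substitutes a goal that was not reached through an interaction.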

# Install
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- pip install scipy matplotlib tianshou hydra-core array2gif imageio opencv-python gym-minigrid==1.2.2 psutil wandb
- pip install setuptools==65.5.0 pip==21  # gym 0.21 installation is broken with more recent versions
- clone mini_behavior (use the private repo rather than the public one): git clone git@github.com:StanfordVL/mini_behavior.git
- cd mini-behavior
- git checkout -b ihrl
- git pull origin ihrl
- pip install wheel==0.38.0
- pip install -e .
- cd .. (back out of mini-behavior), then git clone git@github.com:/robosuite-pushing.git
- cd robosuite-pushing
- pip install -r requirements.txt
- pip install -r requirements-extra.txt
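Several steps above pin exact versions because gym 0.21 fails to install with newer setuptools and pip. The standard-library script below is a sketch (not part of the repo) for confirming the pins after installation. PINNED only lists a few of the pins from this section. pip is left out because pip==21 reports its version as 21.0, which an exact string comparison would flag.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the exact pins from the install steps; extend as needed.
PINNED = {
    "gym-minigrid": "1.2.2",
    "setuptools": "65.5.0",
    "wheel": "0.38.0",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatched or missing pin."""
    problems = {}
    for pkg, expected in pins.items():
        try:
            installed = get_version(pkg)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:  # exact string comparison
            problems[pkg] = (expected, installed)
    return problems


if __name__ == "__main__":
    for pkg, (want, got) in check_pins(PINNED).items():
        print(f"{pkg}: expected {want}, found {got}")
```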

# Run
- Run python train_RL.py to train with the default parameters. Modify parameters in configs/config.yaml. For debugging, use python train_RL.py +env=small
- Debug by printing out the rewards, sampling rates, returns, achieved goals, and desired goals (see the relevant line in State.collector.py)
- Parameters to change:
  - use_her
  - target_goal_epsilon: 0.7
  - target_goal_shaping: -1.0
  - target_graph_epsilon: 0.01
  - adaptive_radius_rate: -1.0
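The +env=small syntax suggests configs/config.yaml is loaded through Hydra, in which case the parameters listed above can also be set from the command line instead of editing the file. The helper below is a hypothetical sketch that builds such a command. It assumes the keys sit at the top level of config.yaml (if they are nested, prefix them with their group path, e.g. group.key=value), and use_her=true is only an example value.

```python
import shlex


def build_command(overrides, script="train_RL.py", extra=()):
    """Build a Hydra-style command line from a dict of config overrides.

    Assumes every key already exists at the top level of configs/config.yaml;
    new keys would need a leading '+' in Hydra's override syntax.
    """
    args = ["python", script, *extra]
    for key, value in overrides.items():
        if isinstance(value, bool):
            value = str(value).lower()  # YAML/Hydra booleans are true/false
        args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    cmd = build_command(
        {
            "use_her": True,
            "target_goal_epsilon": 0.7,
            "target_goal_shaping": -1.0,
            "target_graph_epsilon": 0.01,
            "adaptive_radius_rate": -1.0,
        },
        extra=("+env=small",),  # the small debugging environment
    )
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```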

# Code structure
- train_RL.py: entry point; calls the initializers for the policies, model, and data.
- Policy.gcrl_trainer.py: runs the main loop, which collects data and then updates the RL function.
- Policy.goal_policy: wraps a tianshou policy and implements update, process_fn, and other operations.
- Causal.ac_dynamics.py: wraps an actual cause (AC) model, used to identify interactions and return them.
- Causal.ac_infer...: the AC inference module.
- State.collector.py, State.buffer.py: modified to store interaction information.
- State.extractor.py: converts the flat state into a factored state.
- State.buffer: contains the goal-conditioned (GC) buffers, which have modified hindsight and prioritized sampling operators.
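To illustrate what converting a flat state into a factored state means, here is a hypothetical sketch; the real State.extractor.py interface is not described in this README. The flat vector is assumed to be a concatenation of per-object blocks of known sizes, and the factor names agent, block, and target are made up for the example.

```python
import numpy as np


class FactoredStateExtractor:
    """Hypothetical sketch: split a flat state vector into named factors and back.

    factor_dims maps factor name -> number of dimensions, in the order the
    factors appear in the flat vector.
    """

    def __init__(self, factor_dims):
        self.names = list(factor_dims)
        sizes = np.array([factor_dims[n] for n in self.names], dtype=int)
        ends = np.cumsum(sizes)
        # Contiguous slice of the flat vector owned by each factor.
        self.slices = {n: slice(int(e - s), int(e)) for n, s, e in zip(self.names, sizes, ends)}
        self.total_dim = int(ends[-1]) if len(ends) else 0

    def factor(self, flat_state):
        """Flat (..., total_dim) array -> dict of (..., factor_dim) arrays."""
        flat_state = np.asarray(flat_state)
        if flat_state.shape[-1] != self.total_dim:
            raise ValueError(f"expected last dim {self.total_dim}, got {flat_state.shape[-1]}")
        return {n: flat_state[..., sl] for n, sl in self.slices.items()}

    def flatten(self, factored):
        """Inverse of factor: concatenate factors back in their original order."""
        return np.concatenate([factored[n] for n in self.names], axis=-1)


if __name__ == "__main__":
    extractor = FactoredStateExtractor({"agent": 2, "block": 3, "target": 2})
    print(extractor.factor(np.arange(7.0)))
```

Indexing with [..., slice] keeps leading batch dimensions, so the same extractor works on a single state or on a batch drawn from a buffer.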

# Data generation
- /data//gcrl_data/meta_air_obstacles/data_her_filter_form_control_trial1/results/skill_learn/box2d_2024_09_26_15_08_15/policy200.pth
- /data//gcrl_data/meta_robo_few_eval/data_her_filter_form_control_trial0/results/skill_learn/box2d_2024_09_26_08_50_48/policy100.pth
- /datastor1//gcrl_data/meta_obstacles_few_eval2/data_her_filter_form_control_trial0/results/skill_learn/box2d_2024_09_25_11_23_19/policy_500.pth